Welcome back to Pattern Recognition. Today we want to start talking about support vector machines. Support vector machines are a very powerful classification technique that is built on top of convex optimization. You will see that they essentially build on everything that we have seen so far in this class, and that we can solve some of our problems much more elegantly using the tricks that we will show you in the next couple of videos.
Okay, so today: support vector machines. What is our motivation? Again we want to look into linear decision boundaries, and we assume that we have two linearly separable classes. We then want to compute a linear decision boundary that separates the training data and that generalizes well. This builds on observations by Vapnik, who already in 1996 could show that the optimal separating hyperplane separates the two classes and maximizes the distance to the closest point from either class. This results in a unique solution for the hyperplane and, in most cases, better generalization.
So let's look into the type of problem that we want to solve. Here we have linearly separable and non-separable cases. On the left-hand side you can see the green and the blue dots; they can be separated with a single line, so this is a separable case. On the right-hand side you see a non-separable case: only two points have been swapped, and with these two points the problem can no longer be solved with a single hyperplane that separates the two clusters. If you remember the perceptron, in this right-hand case it would essentially iterate forever and never converge.
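To make that behavior concrete, here is a minimal sketch of a perceptron update loop with an iteration cap on made-up 2D data (the data and the cap are illustration only, not from the lecture). On the separable labels the loop stops once every point is classified correctly; on the swapped, non-separable labels it keeps updating until the cap is reached.

```python
import numpy as np

def perceptron(X, y, max_iter=10_000, lr=1.0):
    """Classic perceptron updates; labels y must be in {-1, +1}.
    Returns the weight vector, the bias, and the epoch at which it stopped."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for epoch in range(max_iter):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:   # misclassified (or exactly on the boundary)
                w += lr * yi * xi        # standard perceptron update
                b += lr * yi
                errors += 1
        if errors == 0:                  # converged: every point classified correctly
            return w, b, epoch
    return w, b, max_iter                # cap reached: data not linearly separable (or cap too small)

# Separable toy data: the loop converges after a few epochs.
X_sep = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0],
                  [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y_sep = np.array([-1, -1, -1, 1, 1, 1])
print(perceptron(X_sep, y_sep))

# Two labels swapped, making the data non-separable: the loop runs into the cap.
y_swap = y_sep.copy()
y_swap[0], y_swap[3] = 1, -1
print(perceptron(X_sep, y_swap))
```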
So let's stick first with the separable case on the left-hand side. The problem here is that there is quite some space between the two point clouds, which means that there are quite a few different solutions you can find; we therefore don't get a unique solution just by aiming at separating the two classes. One thing we could do in this case is, for example, to average several perceptron solutions. The idea introduced by Vapnik is instead to maximize the distance between the decision boundary and the points of the two classes. Here we show such a solution: a hyperplane, drawn as a line with normal vector alpha, that maximizes the distance to the two point sets.
On the left-hand side we have the hard margin problem: in the hard margin problem we really require the two sets to be perfectly separable. Then there is the soft margin problem: in the soft margin problem we will use a couple of additional tricks to also allow misclassifications. Here you can see that there is one point in the second point cloud that cannot be separated, and with these tricks we can still compute a unique decision boundary. This is then called the soft margin problem, because we allow some confusions caused by the optimally separating hyperplane.
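To give an idea of how this maximum-margin requirement becomes a convex optimization problem, here is a minimal sketch using the cvxpy package. The formulation below is the standard hard-margin program; the toy data, the variable names w and b, and the choice of cvxpy are assumptions for illustration only, and the actual derivation follows in the next videos.

```python
import cvxpy as cp
import numpy as np

# Toy separable data; labels must be in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0],
              [6.0, 5.0], [7.0, 7.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# Hard-margin problem: minimize ||w||^2 subject to y_i (w^T x_i + b) >= 1.
# Here w plays the role of the normal vector alpha and b the role of alpha_0.
w = cp.Variable(2)
b = cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
problem.solve()

print("normal vector:", w.value)
print("bias:", b.value)
# In this canonical form the margin, i.e. the distance from the hyperplane
# to the closest point of either class, is 1 / ||w||_2 on each side.
print("margin:", 1.0 / np.linalg.norm(w.value))
```

For the soft-margin case one would additionally introduce non-negative slack variables and a penalty term, which is exactly the kind of trick for allowing misclassifications mentioned above.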
Let's have a look at a bit of linear algebra. We assume that we have an affine function that defines the decision boundary, so we can write it simply as the normal vector alpha transposed times x plus some bias alpha zero. This means that for any point x on the hyperplane we have alpha of x equal to zero: if you are on the plane, then this inner product plus alpha zero is exactly zero. Also note that if you have two points x one and x two that are on the hyperplane, then the following condition has to hold: their difference x one minus x two, multiplied with alpha transposed, must also be zero. This is a necessary condition for all points x one and x two that lie on the hyperplane. Furthermore, note that the actual normal vector of the hyperplane needs to be normalized by its length: alpha is of course a vector that points in the normal direction, but to get the unit normal you have to divide alpha by its length, where we take the two-norm. This is very important in particular if you are considering signed distances: if you really want to compute the signed distance of a point to the hyperplane, then you have to divide alpha transposed x plus alpha zero by the two-norm of alpha.
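Written out as a small sketch (with my own toy numbers for alpha and alpha zero), the signed distance is just the affine function value divided by the two-norm of alpha:

```python
import numpy as np

def signed_distance(x, alpha, alpha0):
    """Signed distance of point x to the hyperplane alpha^T x + alpha0 = 0.
    The sign tells us on which side of the hyperplane x lies; the magnitude is
    the Euclidean distance, which is why we divide by the length of alpha."""
    return (alpha @ x + alpha0) / np.linalg.norm(alpha)

alpha = np.array([3.0, 4.0])   # normal direction, not yet unit length (||alpha||_2 = 5)
alpha0 = -10.0

print(signed_distance(np.array([2.0, 1.0]), alpha, alpha0))   #  0.0, point lies on the hyperplane
print(signed_distance(np.array([5.0, 5.0]), alpha, alpha0))   # +5.0, positive side
print(signed_distance(np.array([0.0, 0.0]), alpha, alpha0))   # -2.0, negative side
```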
In this video, we explain the basic concept of the support vector machine.
This video is released under CC BY 4.0. Please feel free to share and reuse.